Accuracy of Automatic Cross-Corpus Emotion Labeling for Conversational Speech Corpus Commonization
Abstract
Emotional speech corpora are split between two incompatible emotion labeling frameworks: category-based and dimension-based. Commonizing these corpora requires inter-corpus emotion labeling under both frameworks, but having human annotators do this is too costly in most cases. This paper examines the possibility of automatic cross-corpus emotion labeling. To evaluate the effectiveness of automatic labeling, a comprehensive emotion annotation was performed for two conversational corpora, UUDB and OGVC. Using a state-of-the-art machine learning technique, dimensional and categorical emotion estimation models were trained and tested on the two corpora. For emotion dimension estimation, automatic labeling across corpora was effective for the aroused-sleepy, dominant-submissive and interested-indifferent dimensions, showing only slight performance degradation relative to the same-corpus results. For emotion category estimation, on the other hand, performance was not sufficient.
Similar resources
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making man-machine interaction more natural. Since speech is the most popular method of communication, recognizing human emotions from the speech signal has become a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
Selecting Training Data for Cross-Corpus Speech Emotion Recognition: Prototypicality vs. Generalization
We investigate strategies for selection of databases and instances for training cross-corpus emotion recognition systems, that is, systems that generalize across different labelling concepts, languages and interaction scenarios. We propose objective measures for prototypicality based on distances in a large space of brute-forced acoustic features and show their relation to the expected performa...
Grounding Emotions in Human-Machine Conversational Systems
In this paper we investigate the role of user emotions in human-machine goal-oriented conversations. There has been growing interest in predicting emotions from acted and non-acted spontaneous speech. Much of the research has gone into determining what the correct labels are and improving emotion prediction accuracy. In this paper we evaluate the value of user emotional state towards a com...
Automatic labeling of Japanese prosody using J-ToBI style description
Speech corpora with prosodic labels are becoming increasingly important, not only for speech synthesis but also for discourse modeling. However, J-ToBI, a widely used labeling system for Japanese prosody, is insufficient for applications like discourse modeling, and it even lacks an accurate method for automatic labeling. In this paper, we propose an automatic labeling method for J-ToBI style des...
Publication date: 2016